1.   INTRODUCTION

The  Genesis  (Generator  System)  compiler  generator 
system  allows  the  user  to  generate  language  subsets  indi- 
vidually, then  link  together  those  which  form  a  complete 
language  to  get  a  compiler  for  the  language.   The  language 
subsets  are  called  "segments"  of  the  language.   Segment- 
module  libraries  may  be  built  which  make  the  modules  avail- 
able to  any  language  designer  or  compiler  writer  who  may 
need  them. 

One  segment-module  is  generated  at  a  time  by  a  pro- 
gram called  the  Segment  Generator.   The  segment  syntax  is 
written  in  a  syntactic  meta-language  similar  to  BNF  and  the 
segment  semantics  is  written  in  PL/I.   The  segment-modules 
are  linked  together  by  a  program  called  the  Linker  which 
produces  a  compiler  based  on  a  description  written  in  a 
compiler  description  language. 

This  thesis  describes  the  theoretical  aspects,  the 
algorithms,  and  the  implementation  details  behind  Genesis.

1.1   Computing  Environment

The Genesis compiler generator is written in PL/I and runs
on an IBM 360/75 under HASP-OS/MVT with 1M bytes of fast core
and 2M bytes of slow core, located on the campus of the
University of Illinois at Urbana-Champaign. The Segment
Generator runs in 128K bytes plus an input-dependent area.
The Linker runs in 108K bytes plus an input-dependent area.
The compiler generated by the system runs in 48K bytes plus
an area for its syntax tables and an area for its semantic
code.

1.2   Motivation

Genesis is a tool which should promote the "structured
design" of programming languages, just as "structured"
programming languages have promoted the structured design of
programs. With Genesis, a language can be divided into
segments which have one entry and one exit, much as a program
can be divided into subroutines which have one entry and one
exit. This modularized language design technique decreases
the complexity of the compiler-debugging task.

Since  the  segment-modules  are  independent  of  each 
other  and  are  stored  in  a  library,  much  like  a  subroutine 
library,  they  are  available  to  anyone  who  may  need  them. 
Ideally,  segments  which  are  sufficiently  general  could  be 
used  in  many  compilers.   Segments  such  as  one  with  standard 
control  structures,  or  one  with  expressions  containing 
standard  operators  could  be  useful  in  many  languages.   Each 
"standard"  segment  used  by  a  compiler  writer  would  make  the 
task  of  writing  and  debugging  the  compiler  easier.   In  ad- 
dition, the  use  of  "standard"  segments  would  tend  to  stan- 
dardize the  structure  of  the  compiler  and  therefore  would 
make  the  compiler  easier  to  understand  for  someone  familiar 
with  the  standard  components. 


1.3   Background

How does Genesis compare with other compiler generators
which have been developed? Where does Genesis fit in the
wide spectrum of approaches to the construction of compiler
generators? To answer these questions, three widely
different approaches to compiler generation have been selected
to compare and contrast with Genesis: XPL [McKeeman; 1970],
APAREL [Balzer; 1969], and PGS [Mickunas; 1973].

XPL  and  APAREL  are  compiler-writing  languages.   PGS 
and  Genesis  rely  on  ordinary  programming  languages  for  the 
actual  implementation  of  a  compiler. 

Of the three systems, APAREL is probably the least like
Genesis. Both the syntax and semantics of a language are
expressible within the framework of the APAREL language.
The language is an extension to PL/I which includes pattern
statements resembling Backus-Naur Form for specifying the
syntax of the language. While APAREL encourages complete
mixing of syntax and semantics within a user-written
compiler, Genesis provides nearly complete separation of them.
The principle behind this separation in Genesis is that these
two dissimilar, very complex, and often unrelated entities
should be studied individually before they are studied
together. APAREL achieves much greater flexibility for the
compiler because syntax patterns can be invoked at any point
in the compiler.


McKeeman  et  al.  not  only  provide  a  compiler  writing 
language  in  XPL,  but  they  provide  a  compiler  skeleton  as 
well.   SKELETON,  as  it  is  called,  consists  of  a  structure  of 
subroutines  written  in  XPL  which  is  filled  in  by  the  user  to 
form  a  compiler.   In  addition,  the  language  syntax  may  be 
written  in  BNF  and  submitted  to  the  ANALYSER  program  which 
constructs  parse  tables  for  SKELETON.   XPL  differs  from 
Genesis  in  one  major  point — the  compiler  writer  must  know 
the  structure  and  details  of  the  XPL  system  thoroughly  to  be 
able  to  add  his/her  own  code  to  that  of  SKELETON.   Using  that 
knowledge,  the  user  can  tailor  the  lexical  analysis  to  his 
own  special  needs.   One  of  the  prime  goals  in  the  design  of 
Genesis  is  that  a  user  need  not  be  bothered  with  the  details 
of  implementation. 

The  PGS  system  is  the  most  like  Genesis  of  the  three 
systems.   In  PGS,  the  syntax  is  expressed  in  BNF  with  semantic 
tags,  showing  which  semantic  subroutine  is  to  be  executed  at 
each  point  in  the  parse.   Assembler,  Algol,  and  Fortran  sub- 
routines may  be  invoked  as  semantic  routines  in  PGS.   In 
Genesis,  "semantic  numbers"  are  placed  at  various  points 
within  the  syntax  and  at  those  points  the  user-written  PL/I 
semantic  routine  for  the  current  segment  is  called  and  the 
semantic  number  is  passed  to  it. 

1.4   Organization  of  this  Thesis

This thesis is organized in a top-down fashion. The major
ideas behind the system are presented in this Introduction.
In Section 2, a general overview of the system plus a
generalized system diagram are presented. Also, a simple
example of segmentation is presented. This example will be
referenced throughout the paper. In Sections 3 through 5,
the system components will be described in detail, the
theoretical basis for the system will be stated, and the
system algorithms will be presented. Section 6 discusses
possible improvements to the system.


2.   OVERVIEW 

2.1   Genesis  Description

Genesis  consists  of  two  major  programs,  the  Segment 
Generator  and  the  Linker.   The  Segment  Generator  receives  a 
segment  written  in  the  Genesis  syntactic  meta-language 
(syntax)  and  PL/I  (semantics).   It  then  generates  a  parse
table  module,  which  it  puts  in  a  parse  table  library,  and 
uses  a  PL/I  compiler  to  produce  a  semantic  object  module, 
which  it  puts  in  the  semantic  library.   These  two  modules 
are  logically  associated  because  they  are  filed  under  the 
same  name  in  each  library.   The  Linker  reads  a  description 
of  the  compiler  which  it  must  build  and  finds  the  appropriate 
segments  in  both  libraries.   The  Linker  produces  three 
modules:   a  compiler  load  module,  a  parse  table  module  and 
a  lexical  scanner  table  module.   Those  three  modules  together
form  the  compiler  for  the  language.   Figure  1  shows  the 
Genesis  overall  structure. 

A  special-case  version  of  the  Genesis  system  exists, 
which  generates  an  entire  compiler  in  one  step  from  one 
syntax/semantics  description.   This  version  strips  away  all 
segmentation  overhead  made  unnecessary  since  only  one  "segment" 
exists.   Figure  2  shows  a  block  diagram  of  this  program. 

Figure 1.   Genesis Overall Structure

Figure 2.   One-Step Compiler Generation with Special Version
of Genesis


2.2   Segmentation

2.2.1   Definitions 

The  word  "segment"  will  appear  many  times  in  this 
thesis.   It  is  important  that  the  term  be  clearly  understood. 
A  language  segment  is  the  combined  syntax-semantics  descrip- 
tion for  a  subset  of  a  language. 

To  distinguish  the  input  to  the  Segment  Generator 
from  its  output,  the  syntactic  meta-language  and  PL/I  descrip- 
tion will  be  called  a  segment,  while  the  syntax  tables  and 
object  code  produced  will  be  called  a  segment-module. 


A  language  which  is  formed  by  the  combination  of  two 
or  more  segments  is  termed  a  "segmented  language."   The  recog- 
nition process  for  a  segmented  language  which  involves  both 
parsing  within  a  segment,  and  moving  between  segments  is 
called  "segmentation  parsing."   The  segmentation  process  has 
little  to  do  with  the  actual  parsing  technique  used  within 
each  segment.   Segmentation  merely  provides  a  superstructure 
within  which  the  parse  is  carried  out.   This  would  make  it 
possible  for  each  segment  to  be  parsed  via  a  different  tech- 
nique. 

For  the  most  part,  this  thesis  will  deal  with  the 
problems  inherent  in  combining  the  syntax  parts  of  several 
segments.   Thus,  for  simplicity,  "segment"  will  be  frequently 
used  in  place  of  "syntax  part  of  a  segment"  throughout  the 
rest  of  this  paper. 


2.2.2   Segment  References 

Within  one  segment,  a  reference  to  another  segment 
looks  exactly  like  the  reference  to  an  ordinary  nonterminal 
symbol.   The  Segment  Generator  can  distinguish  between  the 
two  since  a  segment  reference  is  simply  an  undefined  non- 
terminal within  some  segment. 

For  example,  consider  the  segments  below. 


A  ->  B  C;
B  ->  'b';

Segment A

C  ->  'c';

Segment C

Within  Segment  A,  A  and  B  are  nonterminals,  'b'  is  a 
terminal  symbol,  and  C  is  a  segment  reference.   Now,  suppose 
the  input  sentence  'bc'  were  to  be  parsed  according  to  the
syntax  structure  defined  by  segments  A  and  C.   The  'b'  of 
the  input  sentence  would  be  found  to  be  syntactically  correct 
according  to  Segment  A.   To  examine  the  rest  of  the  input 
sentence,  the  segmentation  superstructure  causes  an  entrance 
into  Segment  C  and  then  parsing  continues  within  C.   When  'c' 
has  been  recognized  as  syntactically  correct  within  C,  the 
segmentation  superstructure  leaves  Segment  C  and  returns  to 
Segment  A,  indicating  that  C  has  been  successfully  recognized.
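
The rule that a segment reference is simply an undefined nonterminal can be sketched in a few lines. The following is a minimal illustration, not the Segment Generator's actual code: the dictionary encoding of a segment, the function names `segment_references` and `quoted`, and the exact productions assumed for Segment A are all hypothetical.

```python
def segment_references(productions, is_terminal):
    """Return the symbols used on a right-hand side but never defined.

    productions maps each nonterminal defined in this segment to a list
    of right-hand sides, each of which is a list of symbols.
    """
    defined = set(productions)
    references = set()
    for alternatives in productions.values():
        for rhs in alternatives:
            for symbol in rhs:
                # An undefined, non-terminal symbol must name another segment.
                if symbol not in defined and not is_terminal(symbol):
                    references.add(symbol)
    return references


def quoted(symbol):
    # Assumption: terminals are written in single quotes, as in 'b'.
    return symbol.startswith("'")


# Hypothetical encoding of Segment A: A and B are defined, C is not.
SEGMENT_A = {"A": [["B", "C"]], "B": [["'b'"]]}
# Hypothetical encoding of Segment C: it refers to no other segment.
SEGMENT_C = {"C": [["'c'"]]}
```

Applied to the assumed Segment A, the function reports C as the only segment reference; applied to Segment C it reports none.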


2.3   Example

The  segmentation  concept  may  be  best  illustrated  at 
this  point  by  an  example.   First,  consider  the  grammar  for  a 
very  simple  type  of  assignment  statement  which  is  made  up  of 
three  segments: 


ASSIGN  ->  NAME '=' EXPR;

Segment ASSIGN

EXPR    ->  NAME;
        ->  DIGITS;

Segment EXPR

NAME    ->  IDENTIFIER;

Segment NAME


In  this  example  and  throughout  this  thesis,  the 
special  names  IDENTIFIER,  DIGITS,  and  LITERAL  will  denote 
special  sets  of  lexemes.   IDENTIFIER  is  the  name  of  the  set 
of  lexemes  which  begin  with  a  letter  (A-Z)  and  continue  with 
either  letters  or  digits  (0-9).   DIGITS  is  the  name  of  the
set  of  lexemes  which  are  strings  of  digits.   LITERAL  is  the 
name  of  the  set  of  lexemes  which  begin  with  a  single  quote 
and  end  with  a  single  quote,  with  any  character  in  between. 
An  internal  single  quote  is  represented  by  two  consecutive 
single  quotes. 
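
The three lexeme sets can be written down as patterns. The sketch below is only an illustration under stated assumptions: Genesis drives its scanner from a generated finite state table, whereas this example substitutes Python regular expressions, and the names `classify`, `IDENTIFIER`, `DIGITS`, and `LITERAL` as Python objects are hypothetical.

```python
import re

# Assumed patterns for the three lexeme sets described in the text.
IDENTIFIER = re.compile(r"[A-Z][A-Z0-9]*")   # a letter, then letters or digits
DIGITS = re.compile(r"[0-9]+")               # a string of digits
# Quoted text; an internal single quote is written as two single quotes.
LITERAL = re.compile(r"'(?:[^']|'')*'")


def classify(lexeme):
    """Return the name of the lexeme set that matches the whole string."""
    for name, pattern in (("IDENTIFIER", IDENTIFIER),
                          ("DIGITS", DIGITS),
                          ("LITERAL", LITERAL)):
        if pattern.fullmatch(lexeme):
            return name
    return "UNKNOWN"
```

For example, `classify("A1")` yields `"IDENTIFIER"` and `classify("'IT''S'")` yields `"LITERAL"`, while a string such as `1A` belongs to none of the three sets.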

This  grammar  describes  a  language  which  allows  sen- 
tences of  two  forms: 

IDENTIFIER  =  IDENTIFIER 

and 
IDENTIFIER  =  DIGITS. 
The  syntax  tables  for  each  of  these  three  segments  would  be 
generated  separately,  then  linked  together.   The  segmentation 
parse  tree  for  the  input  sentence  "A=B"  is  shown  in  Figure  3. 

To  explore  the  power  provided  by  segmentation,  sup- 
pose a  wider  variety  of  names  are  to  be  accepted,  namely 
singly  subscripted  identifiers.   To  accomplish  this,  the  NAME 
segment  would  be  rewritten,  its  syntax  tables  generated  and 
then  linked  with  the  existing  ASSIGN  and  EXPR  segments.   The 
new  NAME  segment  is  shown  in  Figure  4.   The  EXPR  segment  is
mentioned  in  the  new  NAME  segment  and  therefore  anything  which
the  existing  EXPR  segment  accepts  is  immediately  valid  where 
EXPR  appears  in  NAME.   Figure  4  illustrates  this  with  the 
input  sentence  "A(A(5))=B." 

Next,  EXPR  can  be  expanded  to  accept  operators.   The 
new  EXPR  segment  is  shown  in  Figure  5.   When  it  is  generated 

Figure 3.   Segmentation Parse Tree for "A=B"

Figure 4.   Segmentation Parse Tree for "A(A(5))=B"

ASSIGN  ->  NAME '=' EXPR;

Segment ASSIGN

EXPR    ->  EXPR <'*' EXPR;
        ->  EXPR <'+' EXPR;
        ->  NAME;
        ->  DIGITS;

Segment EXPR

NAME    ->  IDENTIFIER SUBSCR;
SUBSCR  ->  '(' EXPR ')';
        ->  ;

Segment NAME

Figure 5.   Example Segments

and linked with the existing NAME and ASSIGN segments,
expressions throughout the language are suddenly richer. The
input sentence "A(B+3)=B+A(B+2)" is acceptable with this
new set of segments.

While  it  is  true  that  the  grammar  given  in  Figure  5 
for  the  EXPR  segment  is  ambiguous,  it  has  been  shown  [Aho; 
1975]  that  such  a  grammar  plus  some  operator  associativity 
and  precedence  information  is  a  perfectly  valid  expression 
grammar.   In  fact,  the  Genesis  system  accepts  an  expression 
grammar  including  sufficient  associativity  and  precedence 
information  to  remove  ambiguity. 
3.   SYSTEM  OPERATION

3.1   General

The  most  striking  thing  about  the  two  major  components 
of  Genesis  (the  Segment  Generator  and  the  Linker)  and  the 
generated  compiler  is  that  they  all  have  exactly  the  same 
structure.   This  brings  about  one  very  important  property 
of  the  system:   new  versions  of  either  the  Segment  Generator 
or  the  Linker  can  be  produced  by  the  existing  system,  which 
would  automatically  generate  the  new  program. 

The  structure  which  the  major  components  of  Genesis 
exhibit  is  shown  in  Figure  6.   This  structure  consists  of  a 
standard  Recognizer  which  accesses  a  parse  table  and  a 
lexical  scanner  table  and  which  calls  user-written  semantics 
routines. 

The Recognizer is made up of a parser and a lexical
scanner. The parser accesses the parse table and performs
parse actions according to an SLR(1) parsing algorithm. In
addition, it causes entry into and exit from the various
segments in the language being parsed. The lexical scanner
accesses the lexical scanner table and performs actions
according to a finite state machine algorithm.

At various points in the course of parsing an input
program, the parse table may indicate that a semantic action
must be performed. When such a point occurs, the parser calls
the appropriate user-written semantic routine which then
performs the action and returns.

Figure 6.   Genesis Recognizer (shaded areas are standard
components)


3.2   Segment  Generator

3.2.1   Syntactic  Meta-language 

The Segment Generator's input is a grammar for a single
segment written in a syntactic meta-language. The major
extensions beyond BNF in the meta-language concern ways of
specifying certain Genesis options and constants, semantic
information, and operator associativity and precedence
information within expressions. The full syntax of the Genesis
syntactic meta-language appears in Appendix A.

In this syntax notation, a blank left-hand side means
the same left-hand side as that of the last production. In
addition, several productions with identical left-hand sides
may be written with one left-hand side and a series of
right-hand sides separated by the alternation symbol (|). Each
production or production-group with a single left-hand side
is terminated by a semicolon (;).

The semantic numbers are "associated" with the symbol
appearing to the immediate right of the number. If there
is no associated symbol (the semantic number appears at the
far right-hand end of the production), then the number is
associated with the reduction of the entire production. When
the associated symbol has been recognized, the semantic
package indicated by the semantic number is executed.

The special symbols which have to do with expression
disambiguation are left- or right-pointing arrows ("<" or
">"). An operator is identified by the presence of a
left-arrow to its left or a right-arrow to its right. A
left-pointing arrow on the left side of an operator symbol
denotes left-associativity. A right-pointing arrow on the
right side of an operator symbol denotes right-associativity.
If both a left-arrow and a right-arrow surround a symbol,
then the symbol is treated as non-associative.

The  precedence  of  an  operator  is  determined  by  its
position  in  the  grammar  compared  with  the  positions  of  the 
other  operators.   The  higher  a  production  appears  on  the  page 
in  a  listing  of  the  grammar,  the  higher  the  precedence  of  its 
operator.   If  the  operator  is  a  nonterminal  symbol,  all  terminal 
symbols  which  that  nonterminal  can  produce  are  defined  to  be 
operators  of  equal  precedence  and  associativity. 

The  expression  grammar  shown  below  contains  six 
operators.   The  exponentiation  ('**')  operator  has  the  highest 
precedence  and  is  right-associative.   The  multiplication  ('*') 
and  division  ('/')  operators  have  equal  precedence  and  are 
both  left-associative.   The  addition  ('+')  and  subtraction 
('-')  operators  are  both  of  equal  precedence  and  of  lower 
precedence  than  exponentiation,  multiplication  and  division. 
The  less-than  operator  ('<')  has  the  lowest  priority  and  is 
non-associative.   This  means  that  an  expression  like  "A<B<C" 
is  not  allowed  according  to  this  grammar. 

EXPR   ->  EXPR '**' > EXPR;
       ->  EXPR < MULOP EXPR;
       ->  EXPR < ADDOP EXPR;
       ->  EXPR < '<' > EXPR;
       ->  IDENTIFIER;
MULOP  ->  '*' | '/';
ADDOP  ->  '+' | '-';
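
The effect of this precedence and associativity information can be illustrated with a small parser. The sketch below is not how Genesis works internally (Genesis resolves the conflicts in its SLR(1) table); it swaps in the precedence-climbing technique, and the table `OPS`, the numeric levels, and the functions `tokenize` and `parse` are assumptions made for the illustration. The tokenizer silently skips characters it does not recognize.

```python
import re

# (precedence, associativity) per the text; a higher number binds tighter.
OPS = {"**": (4, "right"), "*": (3, "left"), "/": (3, "left"),
       "+": (2, "left"), "-": (2, "left"), "<": (1, "none")}


def tokenize(text):
    # '**' is listed first so it is not split into two '*' tokens.
    return re.findall(r"\*\*|[A-Z][A-Z0-9]*|[*/+\-<]", text)


def parse(tokens, min_prec=1):
    """Precedence climbing; returns (tree, remaining_tokens)."""
    if not tokens or tokens[0] in OPS:
        raise SyntaxError("operand expected")
    left, rest = tokens[0], tokens[1:]
    while rest and rest[0] in OPS and OPS[rest[0]][0] >= min_prec:
        op = rest[0]
        prec, assoc = OPS[op]
        # A right-associative operator lets its right operand reuse its level.
        next_min = prec if assoc == "right" else prec + 1
        right, rest = parse(rest[1:], next_min)
        left = (op, left, right)
        # A non-associative operator may not be chained at the same level.
        if assoc == "none" and rest and rest[0] in OPS \
                and OPS[rest[0]][0] == prec:
            raise SyntaxError("'%s' is non-associative" % op)
    return left, rest
```

With these assumptions, "A-B-C" groups to the left, "A**B**C" groups to the right, and "A<B<C" is rejected, matching the behavior described for the grammar above.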


3.2.2   Segment  Generator  Semantics 

The  Segment  Generator  performs  four  major  tasks. 
First,  and  foremost,  it  builds  the  parse  table  by  applying 
the  Simple  LR(1)  construction  algorithm  to  the  segment's 
grammar.   While  that  construction  proceeds,  the  Generator 
performs  its  second  major  task,  that  of  keeping  track  of  any 
segment  references  in  the  grammar  and  the  parse  states  in 
which  they  occur.   The  third  task  takes  place  after  the 
construction  algorithm  has  made  one  complete  pass  over  the 
grammar—all  ambiguities  caused  by  an  expression  grammar  are 
resolved.   The  fourth  major  task  is  that  of  compressing  the 
parse  table. 

3.3   Linker

3.3.1   Linker  Input  Language 

The  Linker's  compiler  description  language  can  com- 
pletely describe  a  compiler.   The  language  provides  a  means 
for  setting  various  Linker  constants,  naming  which  segments 
are  to  be  linked  together,  specifying  a  body  of  initialization 
code  which  will  be  executed  in  the  compiler  before  compilation 
begins,  and  selecting  from  among  various  compiler  options. 
The  syntax  of  this  compiler  description  language  appears  in 
Appendix  B. 


3.3.2   Linker  Operation 

The  Linker  performs  six  major  tasks.   First,  it  reads 
the  segments  for  the  compiler  and  stacks  the  parse  tables 
together  to  form  one  parse  table.   Second,  it  builds  a
lexeme  cross-reference  table  which  keeps  track  of  the  way 
each  user-defined  lexeme  is  coded  in  every  segment.   Third, 
a  similar  cross-reference  table  is  built  for  the  coding  of 
each  segment  name  in  every  segment.   Fourth,  a  lexical  scanner 
table  is  generated  for  the  compiler.   Fifth,  the  customized 
initialization  routine  and  one  other  customized  routine  are 
generated  and  compiled.   Finally,  the  sixth  task  is  the  link- 
editing  of  the  standard  Recognizer  with  the  customized 
routines  and  the  semantic  routines. 

When  all  of  the  above  has  been  completed  by  the  Linker, 
the  compiler  load  module,  the  parse  table  and  the  lexical 
scanner  table  are  written  to  files  and  are  ready  for  use. 

4.   THEORETICAL  BASIS


4.1   Choice  of  Parsing  Technique

Basic to the success of the Genesis system is the use of
a theoretically sound parsing technique. The SLR(1)
technique was chosen. SLR(1) parsing was first introduced
by F. L. DeRemer in [DeRemer; 1969].

There are four reasons why the SLR(1) technique was
chosen. First, SLR grammars describe a large subset of the
deterministic context free languages. Most programming
languages in use today can be described by an SLR grammar.
Second, I personally find that SLR grammars are more natural
to write than other types of grammars. Third, SLR parsers
report an error at the earliest possible time. This is not
the case for some other parsing techniques. For instance, a
precedence parser may examine an arbitrary number of symbols
after an error has occurred before it reports the error.
Finally, SLR parsers can be made competitive in size and
speed with other parsing techniques through table
transformations [Aho; 1973].

The SLR(1) technique has two big advantages over other
LR techniques. The computation time required to generate its
parse table is in general much less, and there are usually
far fewer table entries with SLR(1) than with other LR
techniques [Aho; 1973].

4.2   SLR(1)  Parsing  and  Table  Construction

The classic model machine for deterministic context
free language recognition is the deterministic push down
automaton (DPDA). This machine is sufficient for SLR(1)
parsing, but is not quite sophisticated enough for
efficient SLR(1) parsing. The machine chosen to model SLR(1)
parsing efficiently for Genesis is called the Genesis DPDA.

The Genesis DPDA consists of three elements: an
input tape (where the input program comes from); a finite
state control (which controls the machine's actions); and
a push down stack. The machine's current condition is
characterized by its "state" (which is actually the state of
the finite control).

The Genesis DPDA can carry out six actions: SHIFT
an input symbol onto the stack; REDUCE a meta-language
production by taking its right-hand side off the stack and
putting its left-hand side on the stack; ACCEPT the input
string; report an ERROR; ENTER another Genesis DPDA; and EXIT
the current Genesis DPDA. Embedded in three of the six
actions is the possible execution of semantic routines.
After the input symbol has been SHIFTED onto the stack, a
semantic routine can be executed. Within REDUCE, both after
the right-hand side has been taken off the stack, and after
the left-hand side has been put on the stack, semantic
routines can be executed. Within ACCEPT, after the right-hand
side of production 1 has been removed from the stack, some
semantic routine may be executed. The inclusion of these
semantic actions makes the Genesis DPDA a translating
machine instead of just a parsing machine. The ENTER and
EXIT actions implement segmentation parsing.

The  Genesis  DPDA  effectively  models  the  parse  tree 
for  an  input  sentence.   The  SHIFT  action  gets  input  symbols 
onto  the  stack,  then  the  REDUCE  action  has  the  effect  of 
replacing  them  with  a  single  nonterminal  symbol  (their 
father  node  in  a  parse  tree).   That  nonterminal  can  then  be
one  of  the  symbols  replaced  by  another  REDUCE,  and  the 
process  continues  until  the  sentence  symbol  alone  is  left 
on  the  stack,  at  which  point  the  input  sentence  is  ACCEPTED. 

As long as the finite control causes the proper
actions at the proper times, the Genesis DPDA can model any
deterministic context-free language parse. The finite
control built for Genesis is for SLR(1) grammars.

4.3   Basis for the SLR(1) Construction

The SLR(1) table construction is done by building a
series of LR(0) items, each of which results in one parse
action entry in the SLR(1) table. The LR(0) items are built
from an SLR(1) grammar by moving a cursor through the
productions of the grammar to all the possible points which the
parse of an input sentence might reach.

Each LR(0) item has the form

$[A \rightarrow \alpha \,.\, \beta]$

where $A$ is a nonterminal in the grammar, $\alpha$ and $\beta$
could be nonterminals, terminals, or $\Lambda$ (the empty
symbol), and $A \rightarrow \alpha\beta$ is a production in the
grammar. The cursor is represented by a period (.). An item
represents a parse that has reached the point between $\alpha$
and $\beta$.

Each  set  of  LR(0)  items  is  called  a  state.   For  each 
state  there  is  a  collection  of  valid  terminal  symbols  which 
will  cause  the  parse  to  continue.   These  symbols  are  each 
associated  with  a  parse  action.   The  procedure  for  determining 
the  parse  actions  is  described  in  section  5.3.4. 
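
The way a cursor position pulls further items into a state can be sketched as the standard LR(0) closure. This is a minimal illustration under assumptions, not the Segment Generator's code: the triple encoding of an item, the function name `closure`, and the use of the section 2.3 grammar flattened into a single grammar (so that NAME and EXPR are ordinary nonterminals rather than segment references) are all choices made for the example.

```python
# Section 2.3 grammar, flattened into one grammar for illustration only.
GRAMMAR = {
    "ASSIGN": [["NAME", "=", "EXPR"]],
    "EXPR": [["NAME"], ["DIGITS"]],
    "NAME": [["IDENTIFIER"]],
}


def closure(items, grammar):
    """LR(0) closure. An item is (lhs, rhs_tuple, cursor_position)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, cursor = work.pop()
        # If the cursor stands before a nonterminal, add that nonterminal's
        # productions with the cursor at their far left.
        if cursor < len(rhs) and rhs[cursor] in grammar:
            for alternative in grammar[rhs[cursor]]:
                item = (rhs[cursor], tuple(alternative), 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


start_state = closure({("ASSIGN", ("NAME", "=", "EXPR"), 0)}, GRAMMAR)
```

Here `start_state` holds the initial ASSIGN item together with the item for NAME with the cursor before IDENTIFIER.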

4.4   Formal SLR(1) Definitions

A grammar is expressed as an ordered 4-tuple
$G = (N, T, P, S)$.

$N$ is the set of all nonterminals in the grammar.
$T$ is the set of all terminals in the grammar.
$P$ is the set of all productions in the grammar.
$S$ is the Start symbol of the grammar. $S \in N$.

The symbol "$\Rightarrow$" means "produces." A grammar is used
to derive the string on the right of "$\Rightarrow$" from the
string on the left.
"$\Rightarrow^{*}$" means "produces in zero or more steps."
"$\Rightarrow^{*}_{rightmost}$" means "produces in zero or more
steps using a right-most derivation."

The $EFF_1$ (Epsilon-Free First) and $FOLLOW_1$ sets will be
used in the definition of SLR(1) grammars. The $FIRST_1$ set
will be used to define $EFF_1$.
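
$FIRST_1(\alpha) = \{\, x \mid \alpha \Rightarrow^{*} x\delta \text{ and } length(x) = 1 \,\}$

$EFF_1(\alpha) = FIRST_1(\alpha)$ if $\alpha$ does not begin with a nonterminal;
or
$EFF_1(\alpha) = \{\, w \mid w \text{ is in } FIRST_1(\alpha) \text{ and there is a derivation } \alpha \Rightarrow^{*}_{rightmost} \beta \Rightarrow wx \text{ where } \beta \neq Awx \text{ for any nonterminal } A \,\}$

$FOLLOW_1(\beta) = \{\, x \mid S \Rightarrow^{*} \alpha\beta\gamma \text{ and } x \text{ is in } FIRST_1(\gamma) \,\}$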
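
As a rough illustration of these definitions, FIRST and FOLLOW sets of length one can be computed by the usual fixed-point iteration. The sketch below is an assumption-laden example rather than the thesis's procedure: the function names, the use of the empty string "" as an internal marker for a vanishing symbol string, the "EOF" end marker, and the flattened section 2.3 grammar are all introduced here for illustration.

```python
GRAMMAR = {
    "ASSIGN": [["NAME", "=", "EXPR"]],
    "EXPR": [["NAME"], ["DIGITS"]],
    "NAME": [["IDENTIFIER"]],
}


def first_of(symbols, first):
    """FIRST of a symbol string; "" marks that the string can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # terminal: itself
        result |= symbol_first - {""}
        if "" not in symbol_first:
            return result
    result.add("")
    return result


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate until nothing new is added
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                first[nt] |= first_of(rhs, first)
                changed |= len(first[nt]) != before
    return first


def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("EOF")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue        # only nonterminals have FOLLOW sets
                    tail = first_of(rhs[i + 1:], first)
                    new = tail - {""}
                    if "" in tail:      # the tail can vanish
                        new |= follow[nt]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FIRST = first_sets(GRAMMAR)
FOLLOW = follow_sets(GRAMMAR, "ASSIGN", FIRST)
```

Under these assumptions the FIRST set of EXPR is the pair IDENTIFIER and DIGITS, and the FOLLOW set of NAME contains "=" (from ASSIGN) and the end marker (from EXPR).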


SLR(1) grammars are defined as follows:
Let $G = (N, T, P, S)$ be a context free grammar (not
necessarily LR(0)). Let $S_0$ be the collection of sets of LR(0)
items for $G$. Let $Q$ be any set of items in $S_0$. Suppose that
whenever $[A \rightarrow \alpha . \beta]$ and $[B \rightarrow \gamma . \delta]$
are two distinct items in $Q$, one of the following conditions
is satisfied:

(1) Neither $\beta$ nor $\delta$ is $\Lambda$.

(2) $\beta \neq \Lambda$, $\delta = \Lambda$ and $FOLLOW_1(B) \cap EFF_1(\beta\, FOLLOW_1(A)) = \phi$

(3) $\beta = \Lambda$, $\delta \neq \Lambda$ and $FOLLOW_1(A) \cap EFF_1(\delta\, FOLLOW_1(B)) = \phi$

(4) $\beta = \Lambda$, $\delta = \Lambda$ and $FOLLOW_1(A) \cap FOLLOW_1(B) = \phi$

Then $G$ is said to be a simple LR(1) grammar (SLR(1) grammar).


4.5   The Combinability Criterion
4.5.1   Definitions

Each segment undergoes SLR(1) table generation. For
each state in a segment's tables, the Segment Generator keeps
track of the set of lexemes which cause a correct parse to
continue. This set is called the continue-set for the state.
The names of all segments which could be entered from each
state are also kept on a segment-list for that state.

Each  segment  has  a  single  entry  point — its  first 
state.   The  continue-set  for  a  segment's  first  state  will  be 
called  the  segment's  seed-set. 

For instance, referring back to the example in section
2.3, and Figure 7 below, the seed-set of ASSIGN is the set
IDENTIFIER of identifiers, the seed-set of NAME is the set
IDENTIFIER, and the seed-set of EXPR is the set
IDENTIFIER $\cup$ DIGITS. The first state in ASSIGN's parse
table lists NAME as a segment which could be entered. State 3
in ASSIGN has "=" in its continue-set and nothing in its
enterable-segment list.

4.5.2   The  Criterion 

Regardless  of  the  parsing  technique  used  for  any  of 
the  segments,  there  is  a  general  criterion  for  deterministic 
segmentation  parsing.   Intuitively,  the  Combinability  Criterion 
states  that  when  all  segments  are  combined  to  form  a  language, 
a  fixed  number  of  symbols  will  be  sufficient  to  determine  the 
next  parsing  action  or  the  next  entry  into  or  exit  from  a 
segment.   Genesis  uses  one  symbol  to  make  that  decision. 

Each  state  in  each  segment  has  a  possibly-empty 
continue-set  C.   If  that  state's  segment-list  is  not  empty, 
then  associated  with  that  state  is  a  collection  S  of  seed-sets. 

The algorithm to determine the members of collection
S is:

1.  Copy  the  state's  segment-list  into  membership  list  M. 

2.  Examine  each  element  of  M,  beginning  with  the  first. 
If  that  segment  has  a  segment-list  in  its  first 
state,  merge  that  list  into  M  (if  a  member  of  the 
segment-list  is  already  in  M,  don't  do  anything  for 
that  member) . 

3.  Continue  until  there  are  no  more  elements  in  M  to 
examine. 

4.  The  group  of  seed-sets  of  all  segments  named  in  M 
is  the  collection  S. 
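The four steps above can be sketched in Python. This is only an illustration: the function names and the dictionary layout (`first_state_lists` mapping a segment to the segment-list of its first state, `seed_sets` mapping a segment to its seed-set) are assumptions made for the example, and the sample data is modeled loosely on the ASSIGN, NAME, and EXPR segments.

```python
def membership_list(segment_list, first_state_lists):
    """Steps 1-3: expand a state's segment-list into membership list M."""
    m = list(segment_list)                  # step 1: copy the segment-list
    i = 0
    while i < len(m):                       # steps 2-3: examine each element in turn
        for name in first_state_lists.get(m[i], []):
            if name not in m:               # members already in M are skipped
                m.append(name)
        i += 1
    return m


def collection_s(segment_list, first_state_lists, seed_sets):
    """Step 4: the seed-sets of all segments named in M."""
    members = membership_list(segment_list, first_state_lists)
    return [seed_sets[name] for name in members]


# Hypothetical sample data for illustration only.
first_state_lists = {"ASSIGN": ["NAME"], "NAME": [], "EXPR": ["NAME"]}
seed_sets = {
    "ASSIGN": {"IDENTIFIER"},
    "NAME": {"IDENTIFIER"},
    "EXPR": {"IDENTIFIER", "DIGITS"},
}
```

Because M grows while it is being scanned, segments reachable only through the first state of another segment are still picked up, and the membership test keeps a circular chain of segments from looping forever.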

The combinability criterion is that C and all sets in
S must be collectively disjoint. Let I be an index set for S.
Then,

$$C \cap S_i = \phi \qquad (\forall i \in I)$$

and

$$S_i \cap S_j = \phi \qquad (\forall i, j \in I,\ i \neq j)$$

where $\phi$ is the empty set and $S_i$ is the i'th set in S.

Figure  7  shows  that  the  three  example  segments  meet 
the  Combinability  Criterion. 
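As a minimal sketch, the criterion amounts to a pairwise disjointness test over C and the members of S. The function name `is_combinable` is hypothetical; the sets in the usage note are the kind of single-state checks listed in Figure 7.

```python
from itertools import combinations


def is_combinable(continue_set, seed_set_collection):
    """True when C and every set in S are pairwise disjoint."""
    sets = [set(continue_set)] + [set(s) for s in seed_set_collection]
    # Covers both C-versus-S_i and S_i-versus-S_j for i != j.
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))
```

For example, `is_combinable({"DIGITS"}, [{"IDENTIFIER"}])` holds, while a state whose continue-set already contains IDENTIFIER could not also list a segment whose seed-set is IDENTIFIER.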
[Figure 7: one table per segment (ASSIGN, NAME, EXPR) with the
columns State, Continue set, Enterable segments, and
Combinability Check. Seed sets: ASSIGN = {IDENTIFIER};
NAME = {IDENTIFIER}; EXPR = {DIGITS, IDENTIFIER}. Checks shown
include {} ∩ {IDENTIFIER} = φ and {DIGITS} ∩ {IDENTIFIER} = φ.]

EOF:   End  of  File 


Figure  7.   Combinability  Criterion  Applied  to  Example  Segments 
(Appendix  C  contains  the  complete  construction  of 
the  parse  tables  for  these  segments.) 




5.   ALGORITHM  DESCRIPTION 


5.1 Parser

The  Genesis  DPDA,  discussed  in  section  4.2,  is  the 
model  for  parsing  used  in  the  Recognizer,  but  the  algorithm 
used  to  implement  it  makes  some  necessary  modifications  to 
it.   The  two  actions  ENTER  and  EXIT  do  not  exist  at  all  in 
the  Recognizer  algorithm.   They  are  replaced  by  extra  code 
in the ACCEPT and ERROR actions. The reason for this change
is one of efficiency. The ENTER and EXIT instructions cannot
be generated by the Segment Generator since it cannot
know  the  seed-set  for  segments  to  be  entered  and  since  it 
cannot  know  whether  the  segment  being  generated  will  be  the 
major  segment  of  a  language  (the  major  segment  is  the  segment 
whose  sentence  symbol  becomes  the  sentence  symbol  of  the 
entire  language).   The  Linker  knows  both  of  these  things, 
but  for  efficiency,  does  not  alter  the  parse  tables  of  the 
segments  which  it  is  linking. 

The  Parser  has  four  actions:  SHIFT,  REDUCE,  ACCEPT, 
and  ERROR.   SHIFT  pushes  an  input  symbol  on  top  of  the  parse 
stack,  then  pushes  the  next  parse  state  number  onto  the  stack 
and  causes  a  transfer  to  that  state.   A  new  input  symbol  is 
then  read. 

REDUCE causes twice the number of symbols which are
on the right-hand side of a production to be removed from the
stack (one for each right-hand side symbol plus one for each
state number pushed on). Then, the top of the stack will
hold a new state number which becomes the current state.
The left-hand side nonterminal symbol is pushed onto the
stack, and a table called the GOTO table is consulted to see
what the next parse state will be. That next state is pushed
onto the parse stack. The method for constructing the GOTO
table will be described in section 5.3.5.
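The stack handling of SHIFT and REDUCE can be illustrated with a short Python sketch. It assumes a list used as the parse stack, with the initial state number at the bottom and symbols and state numbers alternating above it. The names `shift_action`, `reduce_action`, and the `(state, nonterminal)` keying of `goto_table` are hypothetical choices for the example.

```python
def shift_action(stack, symbol, next_state):
    """SHIFT: push the input symbol, then the next parse state."""
    stack.append(symbol)
    stack.append(next_state)
    return next_state


def reduce_action(stack, lhs, rhs_length, goto_table):
    """REDUCE: pop 2 * rhs_length entries, push lhs and the GOTO state."""
    # One symbol and one state number are removed per right-hand side symbol.
    del stack[len(stack) - 2 * rhs_length:]
    exposed_state = stack[-1]               # state uncovered by the pops
    stack.append(lhs)                       # left-hand side nonterminal
    stack.append(goto_table[(exposed_state, lhs)])
    return stack[-1]                        # the new current state
```

Reducing by a production with three right-hand side symbols therefore removes six stack entries before the nonterminal and its GOTO state are pushed.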

ERROR causes the machine to look in the segment-list
for the current parse state and segment. If a segment name
is on the list, that segment is entered by placing the
current segment number on the stack, placing state number one on
the stack (first state of the new segment), then changing
the current segment number to the new segment number. If
the segment-list for the current state and segment is empty,
an ERROR is signalled and the parse stops.

Thus,  ERROR  simulates  the  ENTER  action  of  the  Genesis 
DPDA  in  some  cases.   The  Genesis  Linker  only  allows  one  segment 
to  be  enterable  from  any  one  state  and  it  does  not  record  the 
seed-set  for  the  enterable  segment.   Whenever  an  error  occurs 
and  a  segment  is  enterable,  that  segment  is  entered. 

To avoid a possible infinite loop where a particular
lexeme does not appear in the seed-set of any of a circular
path of enterable segments, a run-time check must be made to
make sure that no segment is entered for a second time for
the same input symbol. For example, Figure 8 shows this
type of situation. If the symbol ";" (not in the seed-set
of either segment) is examined in the first state of segment
A, the parser would find ERROR as its action, then notice it
could enter B, which would find the same situation and
re-enter A. An infinite loop would result if it were not
stopped. A run-time check would stop execution with an ERROR
when A was entered for the second time without ";" being
consumed.

Figure 8. Potential Infinite Loop Situation for Segment Entrance

If  the  full  Genesis  DPDA  were  implemented  and  if  the 
Linker  implemented  the  full  Combinability  Criterion  by 
altering  the  segments'  parse  tables  with  ENTER  and  EXIT 
actions,  the  run-time  check  would  be  unnecessary. 
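A possible shape for the ERROR-driven segment entrance and its run-time check is sketched below in Python. Everything named here is an assumption for illustration: `segment_list` maps a (segment, state) pair to the single enterable segment, and `entered` is a set of segments already entered for the current input symbol, which the caller seeds with the current segment and clears whenever a symbol is consumed.

```python
class ParseError(Exception):
    """Raised when the parse must stop with an ERROR."""


def enter_segment_on_error(stack, current_segment, state, segment_list, entered):
    """Simulate ENTER from the ERROR action; return (new_segment, new_state)."""
    target = segment_list.get((current_segment, state))
    if target is None:
        raise ParseError("segment-list is empty for this state")
    if target in entered:
        # Second entry for the same input symbol: a circular path of segments.
        raise ParseError("segment entered twice without consuming input")
    entered.add(target)
    stack.append(current_segment)           # segment to return to on ACCEPT
    stack.append(1)                         # first state of the new segment
    return target, 1
```

With two segments A and B that list each other in their first states, entering B from A succeeds, and the attempt to re-enter A from B is stopped with an error instead of looping.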

ACCEPT causes an "input accepted" message if the
current segment is the major segment. If it is not, then the
action is treated just as if it were a REDUCE for production
number one of the segment. After the right-hand side of
production one is removed from the stack, the last segment
name is removed from the stack. It is changed to a
nonterminal name and the GOTO table is consulted for the next
parser state. The nonterminal name is pushed onto the stack,
then so is the next parse state.

ACCEPT models the EXIT action of the Genesis DPDA
exactly for all minor segments (every segment other than the
major segment is a minor segment).

5.2 Lexical Scanner

5.2.1   General  Description 

The  Genesis  lexical  scanner  reads  the  input  program 
and  converts  it  to  tokens  which  it  passes  to  the  parser.   The 
token  conversion  process  is  guided  by  a  lexical  scanner  table. 

The scanner recognizes lexemes of one or more
characters as well as identifiers, digit strings, and literals.
The latter are recorded in special tables. Comments are
recognized, then discarded.

The  model  machine  for  the  scanner  is  a  finite  state 
machine.   The  current  input  symbol  together  with  the  current 
state  uniquely  determines  the  actions  which  the  machine  will 
take.   The  actions  are  of  four  different  types.   Actions  of 
type  1  and  type  2  are  carried  out  with  every  state  transition. 
Sometimes  three  and  possibly  all  four  types  are  carried  out 
during  one  state  transition. 

5.2.2 Lexical Scanner Actions

Type 1:

(a) CONSUME INPUT SYMBOL
or (b) DON'T CONSUME INPUT SYMBOL

The machine can choose to go to the next input symbol or not.

Type 2:

(a) TRANSFER TO <STATE>

This changes the current scanner state to <STATE>.
The scanner starts in state 1 when called. For every state
transition, the state is re-assigned with this action.

Type 3: (optional)

(a)   INDICATE  THAT  <LEXEME>  IS  RECOGNIZED 
or  (b)   INDICATE  THAT  <LEXEME>  MIGHT  BE  RECOGNIZED 
or  (c)   DENY  PREVIOUS  POSSIBLE  LEXEME 
or  (d)   CONFIRM  PREVIOUS  POSSIBLE  LEXEME 
When  a  lexeme  has  been  recognized,  either  (a)  or  (d) 
will  signal  that  fact.   Whenever  (b)  is  performed,  one  of  (c) 
or  (d)  will  be  performed  on  the  next  state  transition.   If 
(c)  is  executed,  the  "possible"  lexeme  will  be  declared  a 
false  alarm  and  the  machine  will  continue  until  one  of  (a) 
or  (d)  is  executed. 

Actions  (b) ,  (c)  and  (d)  are  necessary  since  some 
user-defined  lexemes  have  the  same  structure  as  identifiers 
(like  "BEGIN,"  "END,"  etc.). 

If "BEGIN" were a user-defined lexeme, then when
"BEGIN" is seen on input, (b) is executed. Action (a) is
not possible in this case because the next symbol will
determine whether the lexeme is "BEGIN," or possibly some
identifier like "BEGINNING." If the next character after
"BEGIN" is a legal continuation for an identifier, then (c)
is executed. If the next character is not a legal
continuation for an identifier, then (d) is executed.
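The interplay of actions (b), (c), and (d) can be imitated with a small Python function. This is a simplified sketch rather than the table-driven scanner itself: it assumes identifiers consist of letters and digits, that `text[pos]` is a letter, and that `keywords` holds the user-defined lexemes that look like identifiers. The name `scan_word` is hypothetical.

```python
def scan_word(text, pos, keywords):
    """Return (kind, lexeme, new_pos) for the word starting at text[pos]."""
    end = pos
    possible = None                         # lexeme flagged by action (b)
    while end < len(text) and text[end].isalnum():
        end += 1
        if text[pos:end] in keywords:
            possible = text[pos:end]        # (b): lexeme MIGHT be recognized
        else:
            possible = None                 # (c): deny previous possible lexeme
    if possible is not None:
        return ("LEXEME", possible, end)    # (d): confirm previous possible lexeme
    return ("IDENTIFIER", text[pos:end], end)
```

Scanning "BEGINNING" first flags "BEGIN" as possible, denies it when the second "N" arrives, and finally reports an identifier. Scanning "BEGIN" followed by a blank confirms the lexeme.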

Type  4:   (optional) 

(a)   ENTER  <SYMBOL>  IN  NAME  TABLE 
or (b) ENTER <SYMBOL> IN LITERAL TABLE
or  (c)   ENTER  <SYMBOL>  IN  NUMBER  TABLE 

This  action  enters  the  input  string  into  a  table 
according  to  the  type  of  symbol  it  has  found.   A  global 
variable  is  set  with  the  symbol's  position  in  the  table,  so 
that  the  user-written  semantics  can  access  the  symbol. 

When  the  scanner  reaches  the  end  of  the  input  file, 
the  scanner  immediately  sets  the  recognized  symbol  to  be 
<END  OF  FILE>,  then  returns. 

5.3 Segment Generator
5.3.1   General 

Before  the  Segment  Generator  algorithms  are  described, 
a  crucial  term  must  be  defined.   The  core  of  a  parse  state  is 
the  set  of  LR(0)  items  which  exist  in  that  state  before 
closure  is  performed  on  that  state.   Closure  is  a  process 
which  generates  all  additional  relevant  LR(0)  items  for  a 
state  from  the  core  item  of  that  state.   Closure  will  be 
described  in  greater  detail  later. 

5.3.2 Generating the Parse Table

The  parse  table  generation  algorithm  begins  with  the 
core  of  the  first  state  being  set  to  the  item: 

[S → . α]
where S is the sentence symbol and S → α is the first
production in the segment's grammar.

The algorithm first performs closure on the core of
the current state. Then, for each item in the state, it
computes a parse action or a GOTO action and possibly generates
a core item for some new state in a temporary state area.

When  all  items  in  the  current  state  are  processed,  a 
set  of  cores  for  new  states  will  have  been  generated  in  the 
temporary  state  area.   This  set  of  temporary  cores  is  then 
merged  with  all  the  existing  states.   If  a  temporary  core 
matches  the  core  of  an  existing  state,  all  references  to  the 
temporary  state  are  changed  to  refer  to  the  existing  state. 
If  the  temporary  core  does  not  match  any  existing  core,  then 
that  temporary  core  is  added  to  the  list  of  existing  cores. 

When all temporary cores have been merged into the
existing cores, the next successive existing state is
processed (closure is performed, parse actions computed, and
temporary states generated, then merged). This process
continues until all existing states are processed.

5.3.3  Closure 

Closure is performed by looking at each LR(0) item
in the state (both core items and any generated items) in
turn. If the grammar symbol to the right of the cursor is a
terminal symbol, then nothing is done. If the symbol is a
nonterminal A, then the state is augmented by all items of
the form

[A → . α]
such that A → α is a production in the grammar.
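A minimal Python sketch of this closure step follows. The item representation is an assumption made for the example: an item is a tuple `(lhs, rhs, dot)` where `rhs` is a tuple of symbols and `dot` is the cursor position, and `grammar` maps each nonterminal to its list of right-hand sides.

```python
def closure(core, grammar):
    """Return the core items followed by all items generated from them."""
    items = list(core)
    seen = set(items)
    i = 0
    while i < len(items):                   # generated items are examined too
        lhs, rhs, dot = items[i]
        if dot < len(rhs) and rhs[dot] in grammar:   # nonterminal right of cursor
            for alternative in grammar[rhs[dot]]:
                new_item = (rhs[dot], alternative, 0)
                if new_item not in seen:
                    seen.add(new_item)
                    items.append(new_item)
        i += 1
    return items
```

The `seen` set ensures each item is added once, so a left-recursive production such as EXPR → EXPR '+' EXPR does not cause the loop to run forever.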

5.3.4  Computing  a  Parse  Action 

In  a  state  X,  an  item  of  the  form 

[A → α . b β]
where b is a terminal symbol, produces a parse action of
SHIFT, the creation of a core item for a new state Y of the
form

[A → α b . β]
and  an  indication  to  transfer  to  that  new  state  after  the 
SHIFT.   Thus,  whenever  the  terminal  b  is  reported  by  the 
lexical  scanner  and  the  parse  is  in  state  X,  the  parser  will 
SHIFT  the  b  onto  the  stack  and  transfer  to  parse  state  Y. 

An  item  of  the  form 

[A → α .]
produces  a  parse  action  of  REDUCE  for  all  terminal  symbols  in 
the  follow  set  F  of  symbol  A.   Whenever  a  terminal  symbol  in 
F  is  reported,  REDUCE  is  executed  for  the  production. 

5.3.5  Forming  the  GOTO  Table 

In a state X, an item of the form

[A → α . B β]
where B is a nonterminal, causes the creation of a core item
for a new state Y of the form

[A → α B . β]
and produces a GOTO table entry to transfer to state Y.
Several GOTO table entries could exist for each
state, but only one for each unique nonterminal appearing to
the immediate right of the cursor in that state.

5.3.6   Parse  State  Conflicts 

A conflict occurs in a parse state for a grammar when
more than one parse action is possible for a particular
terminal symbol. If this occurs, the grammar is not an SLR(1)
grammar.

For instance, consider the following parse state in
the SLR(1) construction for the example segment EXPR:

State X

    [EXPR → EXPR '*' EXPR .]
        on '*' REDUCE
        on '+' REDUCE (conflict)
        on <end of file> REDUCE

    [EXPR → EXPR . '+' EXPR]
        on '+' SHIFT (conflict)

    (the follow set of EXPR is {'*', '+', <end of file>})

The symbol "+" could trigger two possible actions,
SHIFT and REDUCE. This is called a SHIFT-REDUCE conflict.
It shows that EXPR is not in SLR(1) form.

Some SHIFT-REDUCE conflicts can be resolved by Genesis.
Specifically, ambiguous expression grammars like the EXPR
grammar above can be disambiguated if operator precedence and
associativity information is included in the grammar.
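The rules of sections 5.3.4 through 5.3.6 can be gathered into one illustrative Python function that, for the items of a single state, collects SHIFT cores, GOTO cores, REDUCE entries, and any SHIFT-REDUCE conflicts. The item tuples `(lhs, rhs, dot)`, the function name, and the argument layout are assumptions made for this sketch.

```python
def actions_for_state(items, nonterminals, follow):
    """Classify the items of one state into shift, goto, and reduce entries."""
    shift_cores, goto_cores, reduces = {}, {}, {}
    for lhs, rhs, dot in items:
        if dot == len(rhs):                 # [A -> alpha .]: REDUCE on FOLLOW(A)
            for terminal in follow[lhs]:
                reduces.setdefault(terminal, []).append((lhs, rhs))
        else:
            symbol = rhs[dot]               # symbol to the right of the cursor
            table = goto_cores if symbol in nonterminals else shift_cores
            # Core item for the new state: the cursor moves past the symbol.
            table.setdefault(symbol, []).append((lhs, rhs, dot + 1))
    # A terminal with both a SHIFT and a REDUCE entry is a SHIFT-REDUCE conflict.
    conflicts = sorted(set(shift_cores) & set(reduces))
    return shift_cores, goto_cores, reduces, conflicts
```

Applied to the two items of State X above with the follow set of EXPR, the function reports "+" as the only conflicting terminal.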

5.3.7   Disambiguating  Expressions 

(Refer  back  to  section  3.2  for  the  format  of  operator 
associativity  and  precedence  information  used  for  Genesis. ) 

Conflicts in the states of an expression grammar
parse are of two kinds. The first is in a state with the
following two types of items:

[EXPR → EXPR Op1 EXPR .]
[EXPR → EXPR . Op2 EXPR]

"Op1" and "Op2" denote different operators. When two
different operators are involved in the conflict, as above, the
operator with the higher precedence has control. In the
above situation, if Op1 had higher or equal precedence, the
action would be REDUCE. If Op2 had higher precedence, the
action would be SHIFT.

The second kind of conflict is one with the following
two types of items in one state:

[EXPR → EXPR Op1 EXPR .]
[EXPR → EXPR . Op1 EXPR]

This conflict is between two identical operators.
The associativity of the operator determines the parse action.
If the operator is left-associative, then REDUCE is the proper
action. If the operator is right-associative, then SHIFT is
the proper action. If the operator is non-associative, then
ERROR is the proper action.
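Both kinds of conflict reduce to one small decision, sketched here in Python. The tables `precedence` (larger number means higher precedence) and `associativity` (values "left", "right", or "none") are hypothetical stand-ins for the information described in section 3.2, whose actual format is not reproduced here.

```python
def resolve_conflict(reduce_op, shift_op, precedence, associativity):
    """Pick the parse action for a SHIFT-REDUCE conflict between operators.

    reduce_op is Op1, the operator inside the completed item.
    shift_op is Op2, the operator to the right of the cursor.
    """
    if reduce_op != shift_op:               # first kind: different operators
        if precedence[reduce_op] >= precedence[shift_op]:
            return "REDUCE"
        return "SHIFT"
    # Second kind: identical operators, decided by associativity.
    return {"left": "REDUCE", "right": "SHIFT", "none": "ERROR"}[associativity[reduce_op]]
```

With "*" ranked above "+", the State X conflict on "+" resolves to REDUCE, which groups the multiplication first.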

5.4 Linker

5.4.1   General 

The Linker program performs the following six major
tasks.

(1)  Parse  Table  Construction 

The  Parse  and  GOTO  table  construction  is  done  by 
stacking  the  parse  and  GOTO  tables  for  all  the  segments 
together.   A  segment  index  is  built  which  shows  where  each 
segment's  tables  begin.   Figure  9  shows  the  structure  of  the 
combined  parse  tables.   Section  5.5.2  discusses  the  Segment 
Generator's  algorithm  for  producing  the  individual  compressed 
parse  tables. 

(2)  Terminal  Symbol  Cross-Reference 

The  terminal  symbols  from  all  the  segments  are  merged 
into  one  list  with  one  entry  for  each  unique  terminal  symbol. 
Then,  for  each  entry  in  that  terminal  symbol  list,  a  terminal 
capsule  is  formed.   A  terminal  capsule  lists  the  numeric  code 
used for a certain terminal in each segment. None of the parse
table entries need to be changed, even though one terminal
symbol is represented by different codes in different
segments, since the proper code for the symbol in any segment
can always be found in its capsule.
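The terminal cross-reference can be pictured with a short Python sketch. The data layout is an assumption for illustration: each segment supplies a mapping from terminal symbol to its local numeric code, and a capsule is modeled as a dictionary from segment name to that code.

```python
def build_capsules(segment_terminals):
    """Merge per-segment terminal codes into one capsule per unique terminal."""
    capsules = {}
    for segment, codes in segment_terminals.items():
        for terminal, code in codes.items():
            # The capsule records the code this terminal has in this segment.
            capsules.setdefault(terminal, {})[segment] = code
    return capsules


def code_in_segment(capsules, terminal, segment):
    """Look up the local code of a terminal, or None if the segment lacks it."""
    return capsules.get(terminal, {}).get(segment)
```

Since lookups go through the capsule, a terminal such as end of file may carry a different code in every segment without any parse table entry being rewritten.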


Figure 9. Parse Table Structure for Linked Segments

(3)   Segment  Cross-Reference 

Segment references are treated as undefined
nonterminals in each segment. Each segment name, therefore, is
represented by some numeric code in at least one other
segment. The codes for each segment name in all segments are
kept together in a table similar to the terminal
cross-reference table.


(4)   Lexical  Scanner  Table  Generation 

The Finite State Machine Generator program accepts
a list of user-defined lexemes plus several constants and
option-selections as input. Its output is an FSM table which
can be used to recognize any of the user-defined lexemes plus
any identifiers, literals, digit strings, and comments.

Before  the  table  generation  process  begins,  the  list 
of  user-defined  lexemes  is  sorted  by  length,  longest  first. 
Then  the  list  is  partitioned  into  several  classes.   The 
lexemes  are  classified  according  to  the  first  character  of 
each.   The  classes  are:   Alphabetic  (A-Z),  Blank,  and 
Special  ("*",  "-",  "$",  etc.).   No  user-defined  lexeme  can 
begin  with  a  digit. 

The  various  parts  of  the  table  can  be  built  in  any 
order.   The  parts  which  have  to  be  built  are  one  each  for: 
identifiers,  literals,  digit  strings,  comments,  those  lexemes 
beginning  with  a  blank,  those  beginning  with  an  alphabetic 
character,  and  those  beginning  with  a  special  character. 

The  parts  of  the  table  built  from  the  user-defined 
lexemes  are  constructed  by  starting  in  state  1  and  building 
state  transitions,  character-by-character,  until  the  end  of 
the  lexeme.   If  some  of  the  state  transitions  were  built 
previously,  they  are  left  intact.   Figure  10  shows  an  example 
of  states  being  built  for  the  lexemes  "BE"  and  "BEGIN." 
"BEGIN"  would  have  been  processed  first  since  it  is  longer. 
The  part  of  the  table  for  "BE"  would  use  the  previously-built 
states  and  transitions. 
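The character-by-character construction can be sketched in Python as follows. The representation is assumed for the example: transitions are stored in a dictionary keyed by (state, character), state 1 is the start state, and new states are numbered in the order they are created.

```python
def build_lexeme_states(lexemes):
    """Build FSM transitions for user-defined lexemes, longest lexeme first."""
    transitions = {}                        # (state, character) -> next state
    recognized = {}                         # state -> lexeme ending in that state
    next_free = 2                           # state 1 is the start state
    for lexeme in sorted(lexemes, key=len, reverse=True):
        state = 1
        for ch in lexeme:
            if (state, ch) not in transitions:   # previously built parts stay intact
                transitions[(state, ch)] = next_free
                next_free += 1
            state = transitions[(state, ch)]
        recognized[state] = lexeme
    return transitions, recognized
```

For "BE" and "BEGIN", five transitions are created while "BEGIN" is processed, and "BE" adds none: it only marks the state reached after "E" as recognizing a lexeme.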

It is recommended that no lexeme begin with a blank,
since if one does, every blank in an input program must be
checked to see whether it is the start of that lexeme. This
greatly degrades scanner performance.

Figure 10. Finite State Machine States for "BE" and "BEGIN"

Each special token (identifier, literal, etc.) causes the
construction of special states. Literals, comments, and digit
strings each start with a character unique among all lexemes.
When this character is found in scanner state 1, the machine
jumps to the appropriate state.

There are two special identifier states. One is an
identifier prefix state which checks to see whether a
"possible" lexeme is really that symbol, or is really an
identifier. The other state causes the machine to loop until the
end of an identifier and then reports it has found an
identifier, and enters that identifier in the name table.

The  states  in  the  table  which  represent  DIGITS, 
LITERAL,  and  comments  are  constructed  in  the  obvious  manner, 
based  on  the  definition  of  each. 

(5)  PL/I  Code  Generation 

The  Linker  generates  the  INIT  routine  from  user 
specifications  and  from  the  PL/I  code  given  in  the  user's 
compiler  description.   The  SEMANT  routine  is  also  generated. 
This  routine  is  called  by  the  parser  whenever  a  semantic 
action  is  to  be  done.   The  SEMANT  routine  then  calls  the 
appropriate  user-written  semantic  routine. 

(6)  Compiler  Link-Edit 

The  final  job  of  the  Linker  is  not  performed  by  the 
PL/I  coded  Linker  program  at  all.   The  linking  of  all  the 
object  code  for  the  compiler  is  done  by  the  system  linkage 
editor  after  the  INIT  and  SEMANT  routines  are  compiled. 

5.5 Implementation Description

5.5.1  Segment  Generator  Virtual  Parse  Table 

The representation chosen for the parse table, the
GOTO table, and two other tables used in the Segment
Generator causes them to get very large for large grammars. These
tables were made into virtual tables to enable the user to
select any main memory size for them.

The user specifies the number of parse states which
are to reside in main memory during parse table generation
for each table. Each of the virtual tables is referenced
through a virtual table manager routine which first checks to
see whether the requested state is in memory. If it isn't,
the current block of states is written to secondary storage
and the correct block is read from secondary storage. Then
the correct offset into the main memory block is computed and
returned.
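The behavior of such a manager can be modeled with a small Python class. This is only a sketch under stated assumptions: a dictionary stands in for secondary storage, states are numbered from 1, one block holds exactly the user-chosen number of states, and the class and method names are invented for the example.

```python
class VirtualTable:
    """Keeps one block of states in memory and swaps blocks on demand."""

    def __init__(self, states_in_memory):
        self.block_size = states_in_memory
        self.current_block = 0
        self.memory = [None] * states_in_memory
        self.secondary = {}                 # stand-in for secondary storage
        self.swaps = 0

    def _offset(self, state):
        """Make the block holding `state` resident and return its offset."""
        block, offset = divmod(state - 1, self.block_size)
        if block != self.current_block:
            self.secondary[self.current_block] = self.memory   # write current block
            self.memory = self.secondary.get(block, [None] * self.block_size)
            self.current_block = block
            self.swaps += 1
        return offset

    def get(self, state):
        offset = self._offset(state)        # may replace self.memory first
        return self.memory[offset]

    def put(self, state, value):
        offset = self._offset(state)
        self.memory[offset] = value
```

The `swaps` counter makes the cost of a small memory allotment visible: alternating between states in different blocks forces a write and a read on every reference.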

5.5.2  Parse  Table  Encoding  and  Compression 

The parse actions from the parse table are each
encoded into one word and all ERROR actions are left out to
compress the parse table. The compressed parse table takes
the form of a parse action list with one entry per parse
action and an index with one entry per parse state. The
entries in the index point to the starting place in the
parse action list of the entries for each parse state.
Figure 11 shows this structure.


[Figure 11: an index with one entry per state (State 1, 2, ...)
pointing into the list of non-error parse table entries.]

Figure 11. Compressed Parse Table Structure for Each Segment


The parse actions are encoded as either REDUCE or
SHIFT. The ACCEPT action is treated as a REDUCE for
production 1.

For  SHIFT,  the  terminal  symbol,  the  next  parse  state, 
and  the  semantic  action  (if  any)  are  encoded  into  one  word. 
The  word  is  set  negative  to  indicate  SHIFT. 

For  REDUCE,  the  terminal  symbol,  the  production 
number,  and  the  semantic  action  (if  any)  are  encoded  in  one 
word.   The  word  is  positive  for  REDUCE. 

5.5.3 The Lexical Scanner Table Encoding and Compression

The Genesis lexical scanner table has a unique character
above each column. A row represents one state of the
machine. The letters (A-Z), the special characters ("+",
"-", "=", etc.), and the digits (0-9) form three distinct
classes of entries. If the characters in these classes
occupy adjacent columns in the table, then entries are fairly
uniform as one follows a row across the table. For this
reason, the method of encoding the table involves recording
entries in a list only when they change, going across in a
row. Therefore, for one row (state), no two successive
entries in the compressed table are the same. A bit map is
recorded for each row, showing which entries in the original
table exist in the compressed table.

An  index  points  to  the  place  in  the  entry  list  where 
a  state's  entries  begin.   The  first  entry  for  a  state  is 
always  taken  from  the  first  column  in  the  table.   Thereafter, 
entries  are  recorded  only  when  they  differ  from  the  last  one 
recorded. The bit map for a state contains one bit for every
column in the table. When a bit is set to 1, the corresponding
column entry exists in the compressed table. The first
column entry always exists in the entry list, though the
bit for that column is always set to zero.

The  algorithm  for  decoding  this  table  is  to  count  how 
many  bits  are  set  to  1  in  the  bit  map  from  bit  one  up  to  and 
including  the  bit  for  the  column  which  is  wanted  from  the 
table.   That  result  is  added  to  the  index  entry  for  the  proper 
state  to  get  to  the  index  of  the  requested  element  in  the 
entry  list.   Figure  12  shows  this  structure. 
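The encoding and decoding rules can be checked with a brief Python sketch. The function names are hypothetical, columns are numbered from 0 here rather than from 1, and `start` plays the role of the index entry for the state.

```python
def compress_row(row):
    """Return (bitmap, entries) for one row of the scanner table."""
    bitmap = [0] * len(row)
    entries = [row[0]]                      # first column always stored, bit stays 0
    for col in range(1, len(row)):
        if row[col] != entries[-1]:         # record an entry only when it changes
            bitmap[col] = 1
            entries.append(row[col])
    return bitmap, entries


def lookup(bitmap, entry_list, start, col):
    """Decode one table element from the compressed form."""
    # Count the 1 bits up to and including the wanted column.
    return entry_list[start + sum(bitmap[:col + 1])]
```

A ten-column row whose entries change at the fourth, fifth, and eighth columns compresses to four entries with the bit map 0001100100, the pattern shown in Figure 12.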


[Figure 12: one row (state x) of the original table across
columns A through J, and its compressed form: the bit map
0001100100, a pointer, and the sequential entry list.]

Figure 12. Compressed Lexical Scanner Table Structure

6. FUTURE IMPROVEMENTS

Seven  areas  where  improvement  would  be  helpful  in 
Genesis  are  described  below: 

(1)  Building  the  Parse  Table 

While building the SLR(1) parse table in the
Generator, the entire table need not be represented anywhere.
Entries only need be allocated when they are being built.
Each entry of the table could contain a pointer to the next
entry of the table for a particular state. A set of pointers
could point to the first entry in each state.

This  method  of  representing  the  parse  table  should 
save  a  significant  amount  of  memory  since  the  great  majority 
of entries in an SLR(1) table are ERROR entries, which needn't
be  built  at  all  and  would  therefore  never  be  allocated. 

(2)  Calculating  FIRST  and  FOLLOW  Sets 

The  calculation  of  the  FIRST  and  FOLLOW  sets  is  done 
using  recursion  in  the  current  Genesis  system.   This  process 
is  simple,  but  expensive  in  both  time  and  space  requirements. 
If  a  bit  map  were  kept  for  each  set,  both  space  and  time 
usage  would  improve.   The  bit  maps  would  be  constructed  once, 
then used continually throughout the building of the SLR(1)
table. 

To calculate the FIRST₁ set, first find all
productions whose right-hand side begins with a terminal symbol and
mark that symbol as being in the left-hand side's FIRST₁ set.
When no more such productions exist, then a dependency graph
must be built for nonterminals in the grammar.

For instance, if the following production appears in
the grammar:

A → B C D

then calculation of the FIRST₁ set of A depends on the
calculation of the FIRST₁ set of B. Thus,

(A) -----> (B)
   depends on

would be added to the dependency graph. When the graph is
built, it is checked to make sure that it has no cycles. If
it has no cycles, then "starting points" can be found from
which calculation of all FIRST₁ sets can be completed.


[Diagram: a dependency graph with one node labeled as a
starting point.]

After the dependency graph has been built, all arrows
are reversed. A "starting point" is any node with no
incoming arrows. The FIRST₁ sets will flow in the direction of
the arrows. The FIRST₁ set is copied from one node to the
other in the direction of the reversed arrow.
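The proposed FIRST₁ calculation can be sketched in Python as below. Several assumptions are made for the example: the grammar has no empty productions, any symbol that is not a key of `grammar` is treated as a terminal, a production that begins with its own left-hand side is ignored because it adds nothing to that set, and a `ValueError` is raised if any other cycle is found.

```python
def first_sets(grammar):
    """Compute FIRST1 for every nonterminal by flowing sets along reversed arrows."""
    first = {a: set() for a in grammar}
    depends_on = {a: set() for a in grammar}
    for a, alternatives in grammar.items():
        for rhs in alternatives:
            head = rhs[0]
            if head not in grammar:
                first[a].add(head)          # production begins with a terminal
            elif head != a:
                depends_on[a].add(head)     # FIRST1(a) depends on FIRST1(head)
    # Reverse the arrows: flows_to[b] lists the nodes that receive FIRST1(b).
    flows_to = {a: [] for a in grammar}
    for a, deps in depends_on.items():
        for b in deps:
            flows_to[b].append(a)
    pending = {a: len(deps) for a, deps in depends_on.items()}
    ready = [a for a, n in pending.items() if n == 0]   # starting points
    done = 0
    while ready:
        b = ready.pop()
        done += 1
        for a in flows_to[b]:
            first[a] |= first[b]            # copy the set along the reversed arrow
            pending[a] -= 1
            if pending[a] == 0:
                ready.append(a)
    if done != len(grammar):
        raise ValueError("dependency graph has a cycle")
    return first
```

Each set is copied along an arrow exactly once, which is where the saving over repeated recursive recomputation would come from.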

Likewise, to calculate the FOLLOW₁ set, all
productions would be sequentially searched. Whenever a terminal
follows any symbol, that terminal is added to the symbol's
FOLLOW₁ set. When a nonterminal follows a symbol, the FIRST₁
set of that nonterminal is added to the FOLLOW₁ set of the
symbol.

Next, a dependency graph must be constructed for the
symbols which are on the end of each production. For example,
if

A → B C D

appears as a production in the grammar, the FOLLOW₁ set of D
depends on the FOLLOW₁ set of A, so,

(D) -----> (A)
   depends on

would be added to the dependency graph. After the dependency
graph is built, it would be checked to make sure that there
are no cycles. If there are no cycles, starting points would
be chosen and the FOLLOW₁ set calculation would continue in
the same manner as in the FIRST₁ algorithm. The only
difference between the two graphs is that the FOLLOW₁ set graph
includes both terminals and nonterminals, while the FIRST₁
set graph only includes nonterminals.

(3)   Parse  Table  Compression 

Many techniques for the compression of an SLR(1) table
exist. Several are mentioned in [Aho; 1973]. One of these
techniques could substantially reduce the size of the parse
table. It is claimed in [Aho; 1973] that the SLR(1) table
can be compressed to the point where it is competitive with
a precedence parse table in size.

(4)  Deterministic  Parsing 

Each segment which is generated should have its
seed-set included within its tables so that the Linker can
determine whether the resulting language meets the
Combinability Criterion. Then, the restriction that only one segment
reference can appear in any one state can be relaxed.
Restriction of one segment per state is the present method of
ensuring a deterministic parse.

(5)  Linker 

The  Linker  should  be  made  more  powerful.   If  it  were, 
it  could  alter  the  parse  tables  for  the  various  segments  and 
include  the  ENTER  and  EXIT  actions  at  the  proper  places  in 
the  table.   It  could  also  eliminate  the  need  for  the  terminal 
cross-reference  table,  the  segment  name  cross-reference  table, 
and  the  sets  of  segment  constants,  which  must  be  stored  in 
the present system, by standardizing the codes for all
symbols.

(6)  Production  Lengths 

The  table  of  production  lengths  which  is  currently 
passed  along  with  the  segments  should  be  eliminated.   Instead, 
the  length  information  should  be  encoded  in  the  parse  tables 
by  the  Segment  Generator. 
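
A minimal sketch of the idea follows, assuming a hypothetical table entry format in which each reduce action carries its own right-hand-side length. The `Reduce` type and `apply_reduce` function are illustrative names, not part of Genesis, and a symbol stack is used here in place of a state stack for brevity.

```python
from typing import NamedTuple


class Reduce(NamedTuple):
    lhs: str     # nonterminal on the left-hand side of the production
    length: int  # number of right-hand-side symbols to pop


def apply_reduce(stack, action):
    """Pop action.length symbols, push the left-hand side, return what was popped."""
    start = len(stack) - action.length
    popped = stack[start:]
    del stack[start:]
    stack.append(action.lhs)
    return popped


# The length travels with the table entry, so no separate
# table of production lengths has to be consulted.
stack = ["NAME", "=", "EXPR"]
popped = apply_reduce(stack, Reduce("ASSIGN$", 3))
```

After the call, `stack` holds only `"ASSIGN$"`. An empty production is handled the same way with `length` set to 0.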

(7)  Error  Recovery  Mechanism

Some sort of error recovery mechanism should be included
within the standard parser so that some degree of
intelligent recovery from an error is possible. It may be
possible to specify an error recovery mechanism within the
syntax of a segment.

The error recovery notation should be concise and
should fit into the normal syntax specification in a natural
way. This is a major area which deserves more research, and
possibly a fertile one.

CLOSING  REMARKS


The  Genesis  system  is  currently  being  used  at  the 
Illinois  State  Geological  Survey  in  the  development  of  the 
Retrieval  Request  Language  (RRL)  of  the  Mineral  Resources 
Evaluation  System.   Genesis  is  especially  well-suited  to  the 
special  environment  of  the  Computer  Services  Unit  of  the 
Survey.   The  programming  staff  is  small  and  always  has  much 
more  work  to  do  than  it  can  do.   Genesis  has  allowed  us  to 
work  on  RRL  in  short  bursts,  as  our  schedules  permit,  while 
still  accomplishing  something. 

RRL  was  divided  into  ten  segments.   We  have  been  able 
to  code  a  small  group  of  segments  at  one  time,  then  test  it. 
Since  segments  can  have  only  a  limited  interaction  with  each 
other,  debugging  is  simple.   With  the  additional  assistance 
of  a  semantic  trace,  we  have  been  able  to  quickly  isolate 
bugs  as  they  arise.   Thus  far  the  Genesis  system  has  proven 
to  be  a  useful  tool. 

APPENDIX  A


Genesis  Syntactic  Meta-Language  Syntax 


INPUT      ->  INITIAL SEG;

INITIAL    ->  INITS;

INITS      ->  INITS INIT;
           ->  INIT;

INIT       ->  SETTABLE '=' 21 DIGITS;
           ->  CHARCONST '=' 22 LITERAL;
           ->  23 'LISTPARSE';
           ->  24 'LISTFSM';

SETTABLE   ->  1 'MXPRODS';
           ->  2 'MXCORES';
           ->  3 'ACTNSIZ';
           ->  4 'GOSIZ';
           ->  5 'NAMSIZ';
           ->  6 'LITSIZ';
           ->  7 'LISTSIZ';
           ->  8 'MXRIGHT';
           ->  9 'LITLEN';
           ->  10 'NAMLEN';
           ->  11 'MXPSTAT';
           ->  12 'ITEMPER';
           ->  13 'ACTSIZ';
           ->  14 'GOTOSIZ';
           ->  15 'GOSMSIZ';
           ->  16 'ITEMSIZ';
           ->  17 'MXFENTRY';
           ->  18 'MXFSTAT';
           ->  19 'NUMSIZ';
           ->  20 'NUMLEN';

CHARCONST  ->  25 'NEWCHARS';
           ->  26 'COMSTART';
           ->  27 'COMEND';

# THE BNF SYNTAX;

SEG        ->  SYNT SEMANT;

SYNT       ->  50 'SYNTAX:' PRODS;

PRODS      ->  PRODS PROD;
           ->  PROD;

PROD       ->  LHS '->' RHS 59 ';';

LHS        ->  51 IDENTIFIER;
           ->  52 ;

RHS        ->  RHSYMS;

RHSYMS     ->  RHSYMS RHSYM;
           ->  RHSYM;

RHSYM      ->  53 LITERAL;
           ->  54 IDENTIFIER;
           ->  55 '<';
           ->  56 '>';
           ->  57 DIGITS;
           ->  58 >

SEMANT     ->  60 'SEMANTICS:'

SEMANTICS 

. 

APPENDIX  B


Compiler  Description  Language  Syntax 


SYNTAX:

LINKINPUT  ->  INITS SEGMENTS FSM COMPSPEC 35;

INITS      ->  INITS INIT;
           ->  INIT;

INIT       ->  SETTABLE '=' 21 DIGITS;

SETTABLE   ->  22 'MAXPSTATE';
           ->  23 'MAXSEGS';
           ->  24 'MAXPRODS';
           ->  25 'MAXTRMLN';
           ->  26 'MAXNAMLN';
           ->  27 'GOTOSIZ';
           ->  28 'ACTSIZ';
           ->  29 'TRMSIZ';
           ->  30 'NTRMSIZ';
           ->  40 'NAMSIZ';
           ->  41 'LITSIZ';

# SEGMENT SPECIFICATION;

SEGMENTS   ->  MAJOR MINOR;

MAJOR      ->  32 'MAJOR:' 1 IDENTIFIER '.';

MINOR      ->  'MINOR:' IDLIST 3 '.';
           ->  ;

IDLIST     ->  IDLIST ',' 2 IDENTIFIER;
           ->  2 IDENTIFIER;

# COMPILER COMPOSITION;

COMPSPEC   ->  [illegible] IDENTIFIER 34 ROUTINES LISTING INITIAL;

ROUTINES   ->  ROUTINES ',' ROUTINE;
           ->  ROUTINE;

ROUTINE    ->  4 IDENTIFIER '(' 5 IDENTIFIER ')';

LISTING    ->  'LISTING:' OPTIONS '.';
           ->  ;

OPTIONS    ->  OPTIONS OPTION;
           ->  OPTION;

OPTION     ->  'INDENT' '=' 6 DIGITS;
           ->  'RMARGIN' '=' 7 DIGITS;
           ->  'LMARGIN' '=' 8 DIGITS;
           ->  'PSTAKLN' '=' 9 DIGITS;
           ->  'NAMSIZ' '=' 50 DIGITS;
           ->  'NAMLEN' '=' 51 DIGITS;
           ->  'LITSIZ' '=' 52 DIGITS;
           ->  'LITLEN' '=' 53 DIGITS;
           ->  'NUMSIZ' '=' 54 DIGITS;
           ->  'NUMLEN' '=' 55 DIGITS;
           ->  'FAIL' '=' 10 DIGITS;


INITIAL    ->  'INIT:' IDECLARE IBODY;

IDECLARE   ->  11 'DECLARE:'

#CARD IMAGES OF SOME PL/I DECLARES;

IBODY      ->  12 'BODY:'

#CARD IMAGES OF A PL/I INIT ROUTINE;

# FINITE STATE MACHINE INITIALIZATION;

FSM        ->  'FSM:' INITSFSM;
           ->  ;

INITSFSM   ->  INITSFSM INITFSM;
           ->  INITFSM;

INITFSM    ->  FSETTABLE '=' 13 DIGITS;
           ->  CHARCONST '=' 14 LITERAL;
           ->  15 'LISTFSM';

FSETTABLE  ->  16 'MXFENTRY';
           ->  17 'MXFSTAT';

CHARCONST  ->  18 'NEWCHARS';
           ->  19 'COMSTART';
           ->  20 'COMEND';

SEMANTICS

APPENDIX  C


SLR(1)  CONSTRUCTION  PERFORMED  ON  THE  EXAMPLE  SEGMENTS


Augmented  Grammars:

ASSIGN   ->  ASSIGN$;
ASSIGN$  ->  NAME '=' EXPR;

EXPR     ->  EXPR$;
EXPR$    ->  EXPR$ '*' EXPR$;
EXPR$    ->  EXPR$ '+' EXPR$;
EXPR$    ->  NAME;
EXPR$    ->  DIGITS;

NAME     ->  NAME$;
NAME$    ->  IDENTIFIER SUBSCR;
SUBSCR   ->  ;
SUBSCR   ->  '(' EXPR ')';


SLR(1)  CONSTRUCTION


ASSIGN

STATE 1
   [ASSIGN -> . ASSIGN$]
        on ASSIGN$ goto STATE 2
   [ASSIGN$ -> . NAME '=' EXPR]
        on NAME goto STATE 3

STATE 2
   [ASSIGN -> ASSIGN$ .]
        on EOF REDUCE

STATE 3
   [ASSIGN$ -> NAME . '=' EXPR]
        on '=' SHIFT and goto STATE 4

STATE 4
   [ASSIGN$ -> NAME '=' . EXPR]
        on EXPR goto STATE 5

STATE 5
   [ASSIGN$ -> NAME '=' EXPR .]
        on EOF REDUCE

EXPR

STATE 1
   [EXPR -> . EXPR$]
        on EXPR$ goto STATE 2
   [EXPR$ -> . EXPR$ '*' EXPR$]
        on EXPR$ goto STATE 2
   [EXPR$ -> . EXPR$ '+' EXPR$]
        on EXPR$ goto STATE 2
   [EXPR$ -> . NAME]
        on NAME goto STATE 3
   [EXPR$ -> . DIGITS]
        on DIGITS SHIFT and goto STATE 4

STATE 2
   [EXPR -> EXPR$ .]
        on EOF REDUCE
   [EXPR$ -> EXPR$ . '*' EXPR$]
        on '*' SHIFT and goto STATE 5
   [EXPR$ -> EXPR$ . '+' EXPR$]
        on '+' SHIFT and goto STATE 6

STATE 3
   [EXPR$ -> NAME .]
        on '+' REDUCE
        on '*' REDUCE
        on EOF REDUCE

STATE 4
   [EXPR$ -> DIGITS .]
        on '+' REDUCE
        on '*' REDUCE
        on EOF REDUCE

STATE 5
   [EXPR$ -> EXPR$ '*' . EXPR$]
        on EXPR$ goto STATE 7
   [EXPR$ -> . EXPR$ '*' EXPR$]
        on EXPR$ goto STATE 7
   [EXPR$ -> . EXPR$ '+' EXPR$]
        on EXPR$ goto STATE 7
   [EXPR$ -> . NAME]
        on NAME goto STATE 3
   [EXPR$ -> . DIGITS]
        on DIGITS SHIFT and goto STATE 4

STATE 6
   [EXPR$ -> EXPR$ '+' . EXPR$]
        on EXPR$ goto STATE 8
   [EXPR$ -> . EXPR$ '*' EXPR$]
        on EXPR$ goto STATE 8
   [EXPR$ -> . EXPR$ '+' EXPR$]
        on EXPR$ goto STATE 8
   [EXPR$ -> . NAME]
        on NAME goto STATE 3
   [EXPR$ -> . DIGITS]
        on DIGITS SHIFT and goto STATE 4

STATE 7
   [EXPR$ -> EXPR$ '*' EXPR$ .]
        on '*' REDUCE
        on '+' REDUCE
        on EOF REDUCE
   [EXPR$ -> EXPR$ . '*' EXPR$]
        on '*' REDUCE
   [EXPR$ -> EXPR$ . '+' EXPR$]
        on '+' REDUCE

STATE 8
   [EXPR$ -> EXPR$ '+' EXPR$ .]
        on '*' SHIFT and goto STATE 5
        on '+' REDUCE
        on EOF REDUCE
   [EXPR$ -> EXPR$ . '*' EXPR$]
        on '*' SHIFT and goto STATE 5
   [EXPR$ -> EXPR$ . '+' EXPR$]
        on '+' REDUCE
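
To show how the EXPR states above drive a parse, the following sketch transcribes them into tables and runs a conventional shift-reduce loop. It is an illustration, not the Genesis standard parser. Two simplifications are assumptions of the sketch: NAME is treated as an ordinary token rather than a reference to the NAME segment, and the final reduction of EXPR -> EXPR$ on EOF in STATE 2 is treated as acceptance. The names `ACTION`, `GOTO`, `PRODS`, and `parse` are invented for the example.

```python
SHIFT, REDUCE, ACCEPT = "shift", "reduce", "accept"

# Production name -> (left-hand side, right-hand-side length).
PRODS = {
    "mul": ("EXPR$", 3),
    "add": ("EXPR$", 3),
    "name": ("EXPR$", 1),
    "digits": ("EXPR$", 1),
}

# States 1, 5 and 6 all expect an operand next.
OPERAND = {"NAME": (SHIFT, 3), "DIGITS": (SHIFT, 4)}

ACTION = {
    1: dict(OPERAND),
    2: {"EOF": (ACCEPT, None), "*": (SHIFT, 5), "+": (SHIFT, 6)},
    3: {t: (REDUCE, "name") for t in ("+", "*", "EOF")},
    4: {t: (REDUCE, "digits") for t in ("+", "*", "EOF")},
    5: dict(OPERAND),
    6: dict(OPERAND),
    7: {t: (REDUCE, "mul") for t in ("+", "*", "EOF")},
    8: {"*": (SHIFT, 5), "+": (REDUCE, "add"), "EOF": (REDUCE, "add")},
}
GOTO = {(1, "EXPR$"): 2, (5, "EXPR$"): 7, (6, "EXPR$"): 8}


def parse(tokens):
    """Parse (kind, text) pairs and return a nested (op, left, right) tree."""
    tokens = list(tokens) + [("EOF", "")]
    states = [1]   # state stack, starting in STATE 1
    values = []    # parallel stack of subtrees
    pos = 0
    while True:
        kind, text = tokens[pos]
        entry = ACTION[states[-1]].get(kind)
        if entry is None:
            raise SyntaxError(f"unexpected {kind} in state {states[-1]}")
        op, arg = entry
        if op == SHIFT:
            states.append(arg)
            values.append(text)
            pos += 1
        elif op == REDUCE:
            lhs, length = PRODS[arg]
            args = values[-length:]
            del states[-length:]
            del values[-length:]
            # Unit productions pass the operand through unchanged.
            values.append(args[0] if length == 1 else (args[1], args[0], args[2]))
            states.append(GOTO[(states[-1], lhs)])
        else:
            return values[0]


tree = parse([("DIGITS", "1"), ("+", "+"), ("DIGITS", "2"),
              ("*", "*"), ("DIGITS", "3")])
# tree == ("+", "1", ("*", "2", "3"))
```

The shift on '*' in STATE 8 and the reduce on '+' in STATE 7 are what give '*' the higher precedence: `1 + 2 * 3` groups as `1 + (2 * 3)`, while `1 * 2 + 3` groups as `(1 * 2) + 3`.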

NAME

STATE 1
   [NAME -> . NAME$]
        on NAME$ goto STATE 2
   [NAME$ -> . IDENTIFIER SUBSCR]
        on IDENTIFIER SHIFT and goto STATE 3

STATE 2
   [NAME -> NAME$ .]
        on EOF REDUCE

STATE 3
   [NAME$ -> IDENTIFIER . SUBSCR]
        on SUBSCR goto STATE 4
   [SUBSCR -> .]
        on EOF REDUCE
   [SUBSCR -> . '(' EXPR ')']
        on '(' SHIFT and goto STATE 5

STATE 4
   [NAME$ -> IDENTIFIER SUBSCR .]
        on EOF REDUCE

STATE 5
   [SUBSCR -> '(' . EXPR ')']
        on EXPR goto STATE 6

STATE 6
   [SUBSCR -> '(' EXPR . ')']
        on ')' SHIFT and goto STATE 7

STATE 7
   [SUBSCR -> '(' EXPR ')' .]
        on EOF REDUCE